Trustworthy Artificial Intelligence

Delft University of Technology

Arie van Deursen
Cynthia C. S. Liem

April 30, 2024

Overview

  • Question: How can we faithfully explain predictions of opaque machine learning models?
  • Methods: Counterfactual Explanations, Algorithmic Recourse, Probabilistic Machine Learning, Energy-Based Models, Conformal Prediction, …
  • Applications: Mostly finance and economics but also images and natural language.
  • Tools: I am a Julia developer and the founder of Taija, an organization for trustworthy AI in Julia.

Counterfactual Explanations

\[ \begin{aligned} \min_{\mathbf{Z}^\prime \in \mathcal{Z}^L} \{ {\text{yloss}(M_{\theta}(f(\mathbf{Z}^\prime)),\mathbf{y}^+)} + \lambda {\text{cost}(f(\mathbf{Z}^\prime)) } \} \end{aligned} \]

Counterfactual Explanations (CE) explain how inputs into a model need to change for it to produce different outputs.
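As a concrete illustration, below is a minimal, self-contained Julia sketch of this gradient-based search for a toy classifier. Everything here (the logistic-regression model, squared-error yloss, L1 cost, identity feature map, and step size) is an illustrative assumption, not the implementation behind Figure 1.

```julia
# Minimal sketch of gradient-based counterfactual search (Wachter-style).
# Assumptions: toy logistic-regression classifier, squared-error yloss,
# L1 cost, identity feature map f(z) = z.
using Zygote
using LinearAlgebra

sigmoid(z) = 1 / (1 + exp(-z))

# Toy binary classifier M_θ(x) = p(y = 1 | x)
w, b = [1.5, -2.0], 0.5
M(x) = sigmoid(dot(w, x) + b)

yloss(x, y⁺) = (M(x) - y⁺)^2        # push the prediction towards the target y⁺
cost(x, x₀)  = sum(abs, x .- x₀)    # stay close to the factual x₀
objective(x, x₀, y⁺; λ=0.1) = yloss(x, y⁺) + λ * cost(x, x₀)

# Plain gradient descent in feature space
function counterfactual(x₀, y⁺; λ=0.1, η=0.1, iters=200)
    x = copy(x₀)
    for _ in 1:iters
        g = Zygote.gradient(z -> objective(z, x₀, y⁺; λ=λ), x)[1]
        x -= η .* g
    end
    return x
end

x₀ = [0.0, 1.0]                     # factual, classified as y = 0
x′ = counterfactual(x₀, 1.0)        # counterfactual aiming for y⁺ = 1
@show M(x₀) M(x′)
```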

Figure 1: Gradient-based counterfactual search.

Altmeyer, van Deursen, et al. (2023)


Pick your Poison

Altmeyer et al. (2023)

All of these counterfactuals are valid explanations for the model’s prediction.

Which one would you pick?

Figure 2: Turning a 9 into a 7: Counterfactual explanations for an image classifier produced using Wachter (Wachter, Mittelstadt, and Russell 2017), Schut (Schut et al. 2021) and REVISE (Joshi et al. 2019).

ECCCos from the Black-Box

Key Idea

Combine the hybrid objective of joint energy models (JEM) (Grathwohl et al. 2020) with a model-agnostic penalty for predictive uncertainty (Stutz et al. 2022): Energy-Constrained (\(\mathcal{E}_{\theta}\)) Conformal (\(\Omega\)) Counterfactuals (ECCCo).

ECCCo objective:

\[ \begin{aligned} & \min_{\mathbf{Z}^\prime \in \mathcal{Z}^L} \{ {L_{\text{clf}}(f(\mathbf{Z}^\prime);M_{\theta},\mathbf{y}^+)}+ \lambda_1 {\text{cost}(f(\mathbf{Z}^\prime)) } \\ &+ \lambda_2 \mathcal{E}_{\theta}(f(\mathbf{Z}^\prime)|\mathbf{y}^+) + \lambda_3 \Omega(C_{\theta}(f(\mathbf{Z}^\prime);\alpha)) \} \end{aligned} \]
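To make the moving parts concrete, here is a schematic Julia sketch of this composite objective for a toy softmax classifier. The concrete choices are assumptions for illustration only: the class-conditional energy is taken as the negative target logit (as in JEM, Grathwohl et al. 2020), \(\Omega\) is approximated by a smooth prediction-set size with a fixed conformal threshold (in the spirit of Stutz et al. 2022), and the search runs directly in feature space (identity \(f\)). This is not the paper's reference implementation.

```julia
# Schematic sketch of an ECCCo-style composite objective (illustration only).
# Assumptions: toy linear softmax classifier; E_θ(x|y⁺) = negative target logit
# (JEM-style); Ω = smooth prediction-set size with fixed threshold q̂.
using Zygote
using LinearAlgebra

sigmoid(z) = 1 / (1 + exp(-z))
softmax(z) = exp.(z .- maximum(z)) ./ sum(exp.(z .- maximum(z)))

# Toy 2-class classifier with logits W * x .+ b
W, b = [1.0 -1.0; -0.5 2.0], [0.0, 0.0]
logits(x) = W * x .+ b

L_clf(x, y⁺)  = -log(softmax(logits(x))[y⁺])   # cross-entropy w.r.t. target class
cost(x, x₀)   = sum(abs, x .- x₀)              # closeness to the factual
energy(x, y⁺) = -logits(x)[y⁺]                 # JEM-style class-conditional energy
Ω(x; q̂=0.5, τ=0.1) =                           # soft count of classes above threshold
    sum(sigmoid.((softmax(logits(x)) .- q̂) ./ τ))

eccco_objective(x, x₀, y⁺; λ₁=0.1, λ₂=0.1, λ₃=0.1) =
    L_clf(x, y⁺) + λ₁ * cost(x, x₀) + λ₂ * energy(x, y⁺) + λ₃ * Ω(x)

# Gradient descent over the composite objective
function eccco_counterfactual(x₀, y⁺; η=0.05, iters=300)
    x = copy(x₀)
    for _ in 1:iters
        g = Zygote.gradient(z -> eccco_objective(z, x₀, y⁺), x)[1]
        x -= η .* g
    end
    return x
end

x′ = eccco_counterfactual([1.0, 0.0], 2)   # turn a class-1 point into class 2
```

In this sketch the \(\lambda\)'s trade off validity (classifier loss), closeness to the factual, plausibility (via the energy term), and predictive uncertainty (via the set-size term).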

Figure 3: Gradient fields and counterfactual paths for different generators.

Faithful Counterfactuals

Figure 4: Turning a 9 into a 7. ECCCo applied to MLP (a), Ensemble (b), JEM (c), JEM Ensemble (d).

ECCCo generates counterfactuals that

  • faithfully represent model quality (Figure 4).
  • achieve state-of-the-art plausibility (Figure 5).

Figure 5: Results for different generators (from 3 to 5).

Dynamics of Counterfactuals

Spurious Sparks

Taija

Code

The code used to run the analysis for this work is built on top of CounterfactualExplanations.jl.
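As a hedged sketch of what the high-level workflow looks like: the function names below follow the package's documented API, but exact signatures may differ between versions, so treat this as illustrative and consult the package documentation.

```julia
# Illustrative sketch of the CounterfactualExplanations.jl workflow.
# Function names (CounterfactualData, fit_model, select_factual,
# GenericGenerator, generate_counterfactual) follow the documented high-level
# API, but exact signatures may vary between package versions.
using CounterfactualExplanations

# Toy tabular data: 2 features × 100 samples with binary labels
X = rand(2, 100)
y = Int.(X[1, :] .> X[2, :])
counterfactual_data = CounterfactualData(X, y)

# Fit one of the package's built-in models
M = fit_model(counterfactual_data, :MLP)

# Pick a factual instance and a desired target class
x = select_factual(counterfactual_data, 1)
target = 1

# Wachter-style generic generator; gradient-based search as in Figure 1
generator = GenericGenerator()
ce = generate_counterfactual(x, target, counterfactual_data, M, generator)
```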

There is also a corresponding paper, Explaining Black-Box Models through Counterfactuals, which has been published in JuliaCon Proceedings.

Trustworthy AI in Julia: github.com/JuliaTrustworthyAI

References

Altmeyer, Patrick, Arie van Deursen, et al. 2023. “Explaining Black-Box Models Through Counterfactuals.” Proceedings of the JuliaCon Conferences 1 (1): 130.
Altmeyer, Patrick, Mojtaba Farmanbar, Arie van Deursen, and Cynthia C. S. Liem. 2023. “Faithful Model Explanations Through Energy-Constrained Conformal Counterfactuals.” https://arxiv.org/abs/2312.10648.
Grathwohl, Will, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. 2020. “Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One.” In International Conference on Learning Representations.
Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. 2019. “Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems.” https://arxiv.org/abs/1907.09615.
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Stutz, David, Krishnamurthy Dvijotham, Ali Taylan Cemgil, and Arnaud Doucet. 2022. “Learning Optimal Conformal Classifiers.” https://arxiv.org/abs/2110.09192.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841. https://doi.org/10.2139/ssrn.3063289.